-
Graph Neural Networks (GNNs) have achieved remarkable accuracy in cognitive tasks such as predictive analytics on graph-structured data and have therefore become popular in diverse real-world applications. However, training GNNs on large real-world graph datasets in edge-computing scenarios is both memory- and compute-intensive. Traditional computing platforms such as CPUs and GPUs do not provide the energy efficiency and low latency required by edge-intelligence applications due to their limited memory bandwidth. Resistive random-access memory (ReRAM)-based processing-in-memory (PIM) architectures have been proposed as suitable candidates for accelerating AI applications at the edge, including GNN training. However, ReRAM-based PIM architectures suffer from low reliability due to their limited endurance, and from low performance when used for GNN training on large real-world graphs. In this work, we propose a learning-for-data-pruning framework that leverages a trained Binary Graph Classifier (BGC) to reduce the size of the input data graph by pruning subgraphs early in the training process, thereby accelerating GNN training on ReRAM-based architectures. The proposed lightweight BGC model removes redundant information from the input graph(s), which speeds up overall training, improves the reliability of the ReRAM-based PIM accelerator, and reduces the overall training cost. This enables fast, energy-efficient, and reliable GNN training on ReRAM-based architectures. Our experimental results demonstrate that this learning-for-data-pruning framework accelerates GNN training and improves the reliability of ReRAM-based PIM architectures by up to 1.6×, and reduces the overall training cost by 100×, compared to state-of-the-art data-pruning techniques.
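As a rough illustration of the data-pruning idea described above, the sketch below uses a tiny binary classifier to score candidate subgraphs and keeps only those it deems informative before the expensive GNN training loop runs. All names (BinaryGraphClassifier, summarize, prune_subgraphs, keep_threshold), the mean-pooled feature summary, and the PyTorch formulation are illustrative assumptions; the abstract does not specify the BGC's architecture or its pruning criterion.

```python
# Minimal sketch of learning-for-data-pruning: a lightweight binary classifier
# scores subgraphs, and only the kept subgraphs are passed on to GNN training.
# Names and the mean-pooled summary are assumptions, not the paper's design.
import torch
import torch.nn as nn

class BinaryGraphClassifier(nn.Module):
    """Tiny MLP mapping a fixed-size subgraph summary to a keep probability."""
    def __init__(self, feat_dim: int, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, summary: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(summary)).squeeze(-1)

def summarize(node_feats: torch.Tensor) -> torch.Tensor:
    # Mean-pool node features into one vector per subgraph (assumed summary).
    return node_feats.mean(dim=0)

def prune_subgraphs(bgc: BinaryGraphClassifier, subgraphs, keep_threshold: float = 0.5):
    """Return only the subgraphs the BGC scores at or above the threshold."""
    kept = []
    with torch.no_grad():
        for node_feats, edge_index in subgraphs:
            if bgc(summarize(node_feats)).item() >= keep_threshold:
                kept.append((node_feats, edge_index))
    return kept

# Usage: the pruned list replaces the full subgraph set in GNN training,
# cutting both compute and the number of ReRAM write operations.
# (The BGC here is untrained, purely for demonstration.)
bgc = BinaryGraphClassifier(feat_dim=16)
subgraphs = [(torch.randn(n, 16), torch.randint(0, n, (2, 4 * n))) for n in (8, 12, 20)]
train_set = prune_subgraphs(bgc, subgraphs)
```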
-
Resistive random-access memory (ReRAM)-based processing-in-memory (PIM) architectures are used extensively to accelerate inferencing and training with convolutional neural networks (CNNs). Three-dimensional (3D) integration is an enabling technology for integrating many PIM cores on a single chip. In this work, we propose the design of a thermally efficient, dataflow-aware monolithic 3D (M3D) NoC architecture, referred to as TEFLON, to accelerate CNN inferencing without creating any thermal bottlenecks. TEFLON reduces the Energy-Delay-Product (EDP) by 42%, 46%, and 45% on average compared to a conventional 3D mesh NoC for systems with 36, 64, and 100 PIM cores, respectively. TEFLON reduces the peak chip temperature by 25 K and improves inference accuracy by up to 11% compared to a solely performance-optimized SFC-based counterpart for inferencing with diverse deep CNN models using the CIFAR-10/100 datasets on a 3D system with 100 PIM cores.
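For context on the headline metric: the Energy-Delay-Product is energy multiplied by delay, so a 42% EDP reduction corresponds to achieving 0.58× the baseline's EDP. The short sketch below merely restates the per-configuration reductions reported in the abstract in that form; the dictionary layout and names are illustrative.

```python
# Back-of-the-envelope view of the reported results. EDP = energy * delay,
# so "reduces EDP by r" means the design achieves (1 - r) times the baseline EDP.
# Only the percentages come from the abstract; everything else is illustrative.
edp_reduction = {36: 0.42, 64: 0.46, 100: 0.45}  # PIM cores -> avg. EDP reduction

for cores, r in edp_reduction.items():
    print(f"{cores}-PIM-core system: TEFLON EDP = {1 - r:.2f}x of 3D-mesh baseline")
```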
